# Tested Software

- Stata 17
- R 4.2.2

For the required R packages, please refer to the `DESCRIPTION` file.
For Stata, please install these packages from SSC:
- distinct
- did_imputation
- ppmlhdfe

# Setup

Create the folder `output`. Then create the folders `tables`, `figures` and `numbers` inside of it.
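
The folder layout above can also be created from R. This is a minimal sketch, assuming the working directory is the project root; it is not one of the numbered scripts.

```r
# Create output/ with its three subfolders, relative to the working directory.
# Assumption: the working directory is the project root.
out_dirs <- file.path("output", c("tables", "figures", "numbers"))
for (d in out_dirs) {
  # recursive = TRUE also creates the parent folder output/ if it is missing
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
```

Running it again is harmless, since existing folders are left untouched.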

Execute scripts in ascending order.

# Datasets

## `citation_panel.csv`

Yearly citation counts for treated papers (i.e., papers written by alleged perpetrators) and for a random selection of research-type articles from the same issue or volume.

Variables:
- `eid`: The ID of the document.
- `treated`: Whether the document was written by a perpetrator or not.
- `num_auth`: The number of authors.
- `pub_year`: The publication year.
- `sample`: The ID of the sample.
- `source_id`: The ID of the source (journal, conference proceeding).
- `source_asjc`: The field (ASJC-2) of the source.
- `source_bin`: The percentile of the source in the SCImago journal rank distribution (annual, if available).
- `incident_id`: The incident ID (coincides with ASM ID for all-numeric IDs) for the first incident of a perpetrator.
- `sample_article_num`: The ID of the document within the sample.
- `citation_year`: The year in which the citation is counted.
- `cit_count`: The number of citations in that year.
- `cit_count_female_first_author`: The number of citations by papers with a female lead author in that year.
- `cit_count_male_first_author`: The number of citations by papers with a male lead author in that year.
- `cit_count_soc_dist_1`: The number of citations by papers with the closest author at distance 1 in that year.
- `cit_count_soc_dist_2`: The number of citations by papers with the closest author at distance 2 in that year.
- `cit_count_soc_dist_3`: The number of citations by papers with the closest author at distance 3 or higher in that year.
- `cit_count_female_soc_dist_1`: The number of citations by papers with a female lead author and the closest female author at distance 1 in that year.
- `cit_count_female_soc_dist_2`: The number of citations by papers with a female lead author and the closest female author at distance 2 in that year.
- `cit_count_female_soc_dist_3`: The number of citations by papers with a female lead author and the closest female author at distance 3 or higher in that year.
- `cit_count_male_soc_dist_1`: The number of citations by papers with a male lead author and the closest author at distance 1 in that year.
- `cit_count_male_soc_dist_2`: The number of citations by papers with a male lead author and the closest author at distance 2 in that year.
- `cit_count_male_soc_dist_3`: The number of citations by papers with a male lead author and the closest author at distance 3 or higher in that year.
- `incident_year`: The year of the incident, if known.
- `outcome_year`: The year the incident was resolved.
- `lexisnexis`: Whether LexisNexis covers the case.
- `earliest_lexisnexis_year`: The first year LexisNexis has reports for.
- `big_newspaper`: Whether a big (nation-wide) newspaper reports on the incident.
- `local_newspaper`: Whether a local newspaper reports on the incident.
- `affiliation`: The Scopus ID of the affiliation of the author.
- `outc_leaver`: Whether the incident was resolved by the perpetrator leaving academia.
- `Hard_Science`: Whether the document belongs to the hard sciences.
- `Social_Science`: Whether the document belongs to the social sciences.
- `Rest_Science`: Whether the document belongs to neither the hard nor the social sciences.
- `share_asjc_male_dom`: The share of ASJC-2 fields associated with the source where the share of actively publishing females is below 20% in 2000.
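
Before running the scripts, it can help to confirm that the file contains the columns documented above. The following is a minimal sketch: the helper `missing_columns` is hypothetical (it is not part of the replication scripts), only a subset of the columns is checked, and the file is assumed to sit in the working directory.

```r
# Subset of the columns documented above; extend as needed.
expected_cols <- c("eid", "treated", "pub_year", "sample",
                   "citation_year", "cit_count", "incident_year")

# Hypothetical helper (not part of the replication scripts):
# returns the expected column names that are absent from `df`.
missing_columns <- function(df, expected) {
  setdiff(expected, names(df))
}

# Assumption: the file sits in the working directory.
path <- "citation_panel.csv"
if (file.exists(path)) {
  panel <- read.csv(path)
  print(missing_columns(panel, expected_cols))  # character(0) if all present
}
```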


## `scientist_panel.csv`

Yearly observations of alleged perpetrators and matched control scientists.

Variables:
- `pub_year`: The year of the observation.
- `treated`: Whether the observation belongs to a perpetrator or not.
- `author_first_pubyear`: The author's year of first publication (as per Scopus).
- `pub_count`: The publication stock in that year.
- `pub_wcount`: The stock of SCImago Journal Rank Indicator-weighted publications in that year.
- `num_coauthors`: The stock of unique authors in that year.
- `num_female_coauthors`: The stock of unique female authors in that year.
- `num_old_coauthors`: The stock of unique authors in that year that were already coauthors before the incident.
- `num_old_female_coauthors`: The stock of unique female authors in that year that were already coauthors before the incident.
- `incident`: The incident ID (coincides with ASM ID for all-numeric IDs) for the first incident of a perpetrator.
- `outcome_year`: The year the incident was resolved.
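
Several variables above are stocks. Assuming that "stock" means the cumulative count up to and including the given year, a yearly flow can be recovered by differencing. The sketch below uses toy values for a single scientist in consecutive years; they are illustrative only and not taken from the data.

```r
# Toy values for illustration only, not taken from scientist_panel.csv.
# Assumption: a stock is a cumulative count over consecutive years.
pub_stock <- c(2, 5, 5, 9)

# Yearly flow: the first value is kept, later values are year-on-year differences.
pub_flow <- c(pub_stock[1], diff(pub_stock))
print(pub_flow)  # [1] 2 3 0 4
```

With the real panel, the differences would have to be taken separately for each scientist and only across consecutive years.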
